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^\ . We generalize the Metropolis et al. random walk algorithm to the situation where the energy 

is noisy and can only be estimated. Two possible applications are for long range potentials and 
0^ , for mixed quantum-classical simulations. If the noise is normally distributed we are able to modify 

the acceptance probability by applying a penalty to the energy difference and thereby achieve exact 
' sampling even with very strong noise. When one has to estimate the variance we have an approximate 

D , formula, good in the limit of large number of independent estimates. We argue that the penalty 

[JL| ' method is nearly optimal. We also adapt an existing method by Kennedy and Kuti and compare to 

the penalty method on a one dimensional double well. 
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I. INTRODUCTION 



Ph. 

As Metropolis et al. showed in 19530, Markov random walks can be used to sample the Boltzmann distribution 
. thereby calculate thermodynamic properties of classical many-body systems. The algorithm they introduced is one of 
the most important and pervasive numerical algorithms used on computers because it is a general methcjd of sampling 
, arbitrary highly-dimensional probability distributions. Since then many extensions have been developed!! In addition 
to the sampling of classical systems, many Quantum Monte Carlo algorithms such as Path Integral Monte CarloB, 
variational Monte CarloQ and Lattice Gauge Monte Carlo use a generalization of the random walk algorithm. 

In a Markov process, one changes the state of the system {s} randomly according to a fixed transition rule, 
V(s — > s'), thus generating a random walk through state space, {sq, s%, S2 ■ ■ ■}■ The transition probabilities often 
satisfy the detailed balance property (a sufficient but not necessary condition). This means that the transition rate 
^vq ' from s to s' equals the reverse rate: 

w(s)V(s s') = -k{s')V{s' -> s). (1) 



> 

in 

Here tt(s) is the desired equilibrium distribution which we take for simplicity to be the classical Boltzmann distribution: 
| 7r(s) oc exp(— U(s)/(fcsT)) where T is the temperature and V(s) is the energy. If the pair of functions {tt(s), V(s — > s')} 
. satisfy detailed balance and if V(s — > s') is ergodic, then the random walk will eventually converge to 7r. For more 
00 ' details see Refs. [§|6). 

0^ . In the particular method introduced by Metropolis one ensures that the transition rule satisfies detailed balance by 
""^5 ' splitting it into an "a priori" sampling distribution T(s — > s') (a probability distribution that can be directly sampled 
such as a uniform distribution about the current position) and an acceptance probability a(s — > s') with < a < 1. 
■ The overall transition rate is: 

j^; V(s -> s') = T(s s')a(s -> s'). (2) 

', Metropolis et al.0 made the choice for the acceptance probability: 



where 



a M (s -> s') = min [1, q(s' -> s)] , (3) 



l( s ^ s ') ^ W ~ t = c M-(V(s') - V(s))/(k B T)). (4) 

7T(S)1 (S — > S') 



Here we are assuming for the sake of simplicity that T(s' — » s) = T(s — > s'). The random walk does not simply 
proceed downhill; thermal fluctuations can drive it uphill. Moves that lower the potential energy are always accepted 
but moves that raise the potential energy are often accepted if the energy cost (relative to fc^T = 1//3) is small. Since 
asymptotic convergence can be guaranteed, the main issue is whether configuration space is explored thoroughly in a 
reasonable amount of computer time. 

What we consider in this article is the common situation where the energy, V(s) needed to accept or reject moves, 
is itself uncertain. This can come about because of two related situations: 
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• The energy may be expressed as an integral: V(s) = J dxv(x, s). If the integral has many dimensions, one might 
need to perform the integral with another subsidiary Monte Carlo calculation. 

• The energy may be expressed as a finite sum: V(s) = efe(s) where is N is large enough that performing 
the summation slows the calculation. It might be desirable for the sake of efficiency to sample only a few terms 
in the sum. 



A. Mixed Quantum-Classical Simulation 

First, consider the typical system in condensed matter physics and chemistry, composed of a number of classical 
nuclei and quantum electrons. In many cases the electrons can be assumed to be in their ground state and to follow 
the nuclei adiabatically. To perform a simulation of this system, we need to accept or reject the nuclear moves based 
on the Born-Oppenheimer potential energy Vbo(s), defined as the eigenvalue of the electronic Schrodinger equation 
with the nuclei fixed at position s. In most applications, this potential is approximated by a semi-empirical potential 
typically involving sums over pair of particles. More recently, in the Car-Parrinello molecular dynamics methodQ, one 
performs a molecular dynamics simulation of the ions simultaneous with a solution of the electronic quantum wave 
equation. To be feasible one uses a mean field approximation to the full many-body Schrodinger equation using the 
local density functional approximation to density functional theory or a variant. Others have proposed coupling a 
nuclear Monte Carlo random walk to an LDA calculation^. Although mean-field methods such as LDA are among 
the most accurate methods fast enough to be useful for large systems, they also have known deficiencies^. 

We would like to usej-a quantum Monte Carlo (QMC) simulation to calculate Vbo(s) during the midst of a classical 
MC simulation (CMC)E3. QMC methods, though not yet rigorous because of the fermion sign problem, are the most 
accurate methods useful for hundreds of electrons. But QMC simulation will only give an estimate of Vbo(s) with 
some statistical uncertainty. It is very time consuming to reduce the error to a negligible level. We would like to take 
into account the statistical error without having to reduce it to zero. 

Note that we do not wish the new Monte Carlo procedure to introduce uncontrolled approximations because the 
goal of coupling the CMC and QMC is a robust, accurate method. We need to control systematic errors. It has 
been noticed by Doll and FreemarO, after studying a simple example, that CMC is robust with respect to noise but 
recommend using small noise levels and small step sizes to minimize the systematic errors. However, this can degrade 
the overall efficiency. If we can tolerate higher noise levels without introducing systematic errors, the overall computer 
algorithm will run faster and more challenging physical systems can be investigated, e.g. more electrons and lower 
temperatures 

B. Long-range potentials 

In CMC with a pair potential, to compute the change in energy when particle k is moved to position v' k , one needs 
to compute the sum 

JV 

AV(r' k ) = Y,Mr' kj )-v(r kj )]. (5) 
j=i 

This is referred to as an order TV 2 algorithm since the computer effort to move all particles once is proportional to 
N 2 . If the interaction has a finite range, neighbor tablesEj will reduce this complexity to order N. However charged 
systems with Coulomb interactions are not amenable to this treatment. Usually the Ewald image method is used to 
handle the long-range potentials with a complexity!! 3 ! of order iV 3 / 2 . The fast multipole methocO, which scales as N 
for the Coulomb interaction is not applicable to Monte Carlo since that method computes the total energy or force 
and in MC we need the change in potential as a single particle is moved. 

The challenge is to come up with an order N Monte Carlo method for charged systems. In the Ewald method, the 
potential is split into a short-range part and a long-range part: 

v(r) = v s (r) + vi(r). (6) 

The short ranged part is a finite ranged and can be handled with neighbor tables, the long range part is usually 
expanded in a Fourier series, at least in periodic boundary conditions and is bounded and slowly varying. We suggest 
that it is possible to estimate the value of vi(r) by sampling either particles at random, or terms in its Fourier 
expansion. The question that arises is how to compensate for the noise of the estimate in AV. 
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In both of these examples one could simply ignore the effect of fluctuations in the estimate of AV(s). If the errors 
are small then clearly the sampled distribution will be changed only a little. If the acceptance ratio as a function of 
AV(s) were a linear function there would be no bias, but because it is non-linear, fluctuations will bias the asymptotic 
distribution. In this paper we will make a conceptually simple generalization of Metropolis algorithm, by adjusting 
the acceptance ratio formula so that the transition probabilities are unaffected by the fluctuations in the estimate of 
AV(s). We end up with a completely rigorous formula in the sense that if one averages long enough, one will get 
the exact distribution, even if the noise level is large. The only assumption is that the individual energy estimates 
are independently sampled from a normal distribution whose mean value is AV(s). One complication is that the 
estimates of the variance of Wis) are also needed. We show how to treat that case as well. 

Kennedy, Kuti and BhanotOiij introduced an algorithm with many of the same aims as the present work but for 
computations in lattice gauge theory. We will describe their method and compare it to the new method later in the 
paper. 



II. DETAILED BALANCE WITH UNCERTAINTY. 



From the two examples discussed above, let us suppose that when a move from s to s' is made, an estimate of the 
difference in energy is available, which we denote S(s — > s'). (We often take units with fcgT = 1 hereafter.) By V(s) 
we mean the true potential energy. Let a(s — > s') be a modified acceptance probability; we assume that it depends 
only on the estimate 5 of the energy difference. Let P(S; s — > s')dS be the probability for obtaining a value 8. Then 
the average acceptance ratio from s to s': 

poo 

A{s^s')= dSP{5;s^ s')a{6). (7) 



The detailed balance equation is: 
Defining: 

A(s -» s') = [V(s') - V{s)]/k B T - Inpy - s)/T(s -» s')} (9) 

we can rewrite the detailed balance equation as: 

A(s -> s') = e- A A(s' s). (10) 

If the process to estimate S is symmetric in s and s' then P(S;s' — > s) = P(—S;s — > s'). Then detailed balance 
recmires: 



f 

J — ( 



dSP(6; s -> s')[a(S) - e - A a(-6)} = 0. (11) 



In addition, we must have that < a(S) < 1 since a is a probability^. 

The difficulty in using these formulas is that during the MC random walk, we do not know either P(5; s — > s') or 
A. Hence we must find a function a(6) which satisfies Eq. ( |ll| ) for all P(6) and A. 

To make progress we assume a particular form for P(5; s — * s'). In many interesting cases, the noise of the energy 
difference will be normally distributed. In fact the central limit theorem guarantees that the probability distribution 
of 5 will approach a normal distribution if the variance of the energy difference exists and one averages long enough. 
Given that (S) — A, the probability of getting a particular value of 5 is: 

P(S) = {2a 2 ir)- 1/2 exp(-(<5 - A) 2 /(2a 2 )). (12) 

In this section only, we will assume that we know the value of <r, that only A is unknown. We will discuss relaxing 
this assumption in Sec. IV. 



In the case of a normal distribution with known variance a we have found a very simple exact solution to Eq. (|l] 

a P (S- cr) =min(l,cxp(-5-(T 2 / 2 )) (13) 
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The uncertainty in the action just causes a reduction in the acceptance probability by an amount exp(— er 2 /2) for 
6 > — <t 2 /2. We refer to the quantity u = a 2 / 2 as the noise penalty. Clearly, the formula reverts to the usual 
Metropolis formula when the noise vanishes. 

To prove Eq. satisfies Eq. ([To|), one does the integrals in Eq. (Q) to obtain: 

A(A) = i[e~ A erfc(c(a 2 /2 - A)) + erfc(c(c7 2 /2 + A))] (14) 

where erfc(.z) is the complimentary error function and c = l/\/2a 2 . 

Below we apply Eq. (|l3|) to several simple problems and find that it indeed gives exact answers to statistical 
precision. The remainder of the paper concerns considerations of efficiency, a comparison to other methods and the 
more difficult problem of estimating a. 

III. OPTIMALITY 

The chief motivation for studying the effect of noise on a Markov process is for reasons of efficiency. If computer 
time were not an issue, we could average enough to reduce the noise level to an insignificant level. In this section we 
are concerned with the question of how to optimize the acceptance formula and the noise level. 

A. Acceptance ratio 

We first propose a measure of optimality of an acceptance formula and relate that to a linear programming problem. 
It is clear that Eq.(|ll]) can have multiple solutions; its solution set is convex. For example, if a(S) is a solution then 
so is Aa(^.Jor < A < 1. Even in the noise-less case, several acceptance formulas have been suggested in the 
literaturetao. To choose between various solutions we now discuss the efficiency of the -Markov process, namely the 
computer time needed to calculate a property to a given accuracy. It is a difficult problemEJ to determine the efficiency 
of a Markov chain but PeskunEll has shown that given two acceptance rules, a\(x) and a,2(x), if ai(A) > (22(A) for 
all A 7^ 0, then every property will be computed with a lower variance using rule 1 versus rule 2. Hence the most 
efficient simulation will have the maximum value of A. Very roughly what Peskun has shown is that it is always better 
to accept moves, other considerations being equal. 

We propose to call an optimal acceptance formula, one where the average probability of moving is as large as 
possible. Let W(8)d8 be the probability density of attempting a move with a change in action S, ( W(5) > 0.) In our 
definition an "optimal" formula will maximize: 

/oo 
dSW(S)(a(S)-a M (S)). (15) 
-00 

It is likely that the optimal functions are, to a large part, independent of W and so we set W(x) = 1. We subtracted 
&m(x), the Metropolis formula, so the integral would be convergent. Note that for the solution for a normal distribution 
ap{8) we have: £p = — ct 2 /2. . 

In the noise-less case one can easily showEil that the Metropolis formula is optimal. Without uncertainty, Eq. 
( |To| ) only couples values with the same \S\: a{5) = e~ s a{-5). For each 6 > 0, one needs to maximize: W(6)a(6) + 
W(—8)a(—S). This and the constraint < a(x) < 1 leads to the solution a(8) = 1 if 5 < 0. 

We conjecture that the formula Eq. ([D|) is nearly optimal; one argument is based on an analysis of the large and 
small 8 limits: the other is numerical. First, consider moves which are definitively uphill or downhill S 2 3> o 2 . We 
expect downhill moves will always be accepted for an optimal function, so A(A) — 1; this is its maximum value. 
Then from Eq. ([To]) A(A) = e~ A for A ^> a. Now we must invert Eq. (^). The unique continuous solution is 
a(S) — exo(— 8 — cr^/2) for 5 3> a. Hence, in the region |<5| 3> a the solution is optimal in the class of continuous 
functionsE 2 ! 

Another approach to finding the optimal solution to Eq. (^Tj) is numerical. We wish to maximize Eq. ([[5]) subject 
to equality constraints and the inequality constraints that a(S) be a probability. This is an infinite dimensional linear 
programming (LP) problem, a well-studied problem in optimization theory for which there exist methods to determine 
the globally optimal solution. To find such a solution, we represent a(8) on a finite basis. We used a uniform grid 
in the range —yto y and assumed that outside the range a(S) had the asymptotic form derived above. The discrete 
version of Eq. (Q) is Aj = ^ Kijai + Cj where Cj represents the contribution coming from \S\ > y and Kij = P(o~i\ Aj) 
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for the simplest quadrature. The problem is to find a solution maximizing ai subject to the inequalities: < a < 1 
and the equalities: 

X k - j ~ e-^Ki^ai = e-**c-i - cj. (16) 

i 

Fig. [l] shows the LP solution, for a — 1 compared with ap(5). Note that it is not a continuous function, but 
for the most part consists of regions with = 1 alternating with regions with a, = 0. The LP solution is a very 
accurate solution to the problem posed, with errors of less than 10~ 5 . The discontinuous nature of the LP solution 
is to be expected since the solution must lie on the vertices of the feasible region, determined by the equalities and 
inequalities. To obtain the solution to this difficult ill-conditioned problem, we discretized the values of 8 on a grid 
with spacing 0.01. However we only demanded that Eq. ( [io|) be satisfied on a grid A with a spacing of 0.2. This 
implies that there were 40 times as many degrees of freedom as equality constraints and thus most variables were free 
to reach the extreme values of and 1. 

The optimal LP function has a slightly larger value of £, roughly about i=s — 0.45c 2 versus the value for a p of 
£p = — 0.5er 2 . As far as we can determine, the LP solutions survive in the limit dr — > and are slightly more optimal 
than ap. However, given the inconvenience of determining and programming the LP solutions, and the very limited 
improvement in £, we see little reasoned to prefer such solutions. When we added a factor to penalize discontinuities 
in a(S) to the objective function proportional to J2i( a i ~ a i-i) 2 (this makes it a quadratic programming problem) 
then the solution converged to ap(d). 

B. Noise level 

Now let us consider how to optimize the noise level a. An energy difference with a large noise level can be computed 
quickly, but because of the penalty in Eq. ( |l3| ) it has a low acceptance ratio, reducing the overall efficiency of the 
simulation. We should pick a to minimize the variance of some property with the total computer time fixed. The 
computer time can be written as T — m(nt + to) where t is the time for an elementary evaluation of a given energy 
difference, n is the number of evaluations of 5 before an acceptance is tried, m is the total number of steps of the 
random walk and to is the CPU time in the noise-less part of the code. But the error in any property converges as 
e = c(er)m -1 / 2 where c is some function of a and the noise level converges as a = dn" 1 ! 2 where d is some constant. 
Eliminating the variables m and n, we write the MC inefficiency: 

C 1 = Te 2 - t Q c(a) 2 [fa- 2 + l] . (17) 

Here / = d 2 t/to, the relative noise parameter, is the CPU time needed to reduce the variance of the energy difference 
relative to the CPU time used in the noise-less part of the code: for / <C 1 noise is unimportant, for / ^ 1 computation 
of the noisy energy difference dominates the computer time. 

To demonstrate how important this optimization step is, we consider a one dimensional double well with a potential 
given by: 

k B TV(s) = a lS 2 + a 2 s 4 . (18) 

We picked parameters such the two minima are at s = ±4 and the height of the central peak is 7r(0)/7r(4) = 0.1, 
which corresponds to a\ = —0.288 and a 2 = 0.009. We used a uniform transition probability (T(s — > s')) with a 
maximum move step of 0.5. This means overcoming the barrier requires multiple steps, typical of an application 
which has a probability density with several competing minima. To measure the efficiency, we computed the error 
on the average value of (s k ) on Markov chains with 10 7 steps. We examined values of noise in the range < a < 6. 
We also calculated the density and compared to the exact values obtained by deterministic integration. Shown in 
Fig. [2] is the acceptance ratio versus a. We see that it decreases to zero rapidly at large noise levels. The dotted line 
(cx exp[— c 2 /8]) is the asymptotic form for large a. 

Fig. shows an example of the density obtained when the noise in the energy was a = 2. It is seen that ignoring 
the noise leads to a much smoother density than the exact result. Using the acceptance formula ap(5) we recover the 
exact result within statistical errors. 

Figure ^ shows the inefficiency (relative to its value when the noise is switched off) versus a and /. In general, 
as the difficulty of reducing the noise (as measured by /) increases, the calculation becomes less efficient, and the 
optimal value of a increases. The two panels show the efficiency of computing (s) and (s 2 ); the behavior of the error 
is quite different for even and odd moments of s because the error in the first moment is sensitive to the rate at which 
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the walk passes over the barrier, while the second is not. The flat behavior at large noise level of the first moment 
occurs because the noise actually helps passage over the barrier: for / > 3 a finite optimal value of a ceases to exist. 

On this example, we find that c(cr) cx exp(acr 2 ) with a w 0.09 for even moments and a « 0.025 for odd moments. 
With this assumption the optimal value of the noise level equals: 

<r* a = (//2)h/l + 2/(/a)-l]. (19) 

Although this formula is approximate (because of the assumption on c(er)) it does give reasonable values for the 
optimal a. 

As this example demonstrates, it is much more efficient to perform a simulation at large noise levels. One can 
quickly try very many moves even if most of them get rejected instead of just a few ones where the energy difference 
has been accurately computed. However, there are practical problems with using large a as will be discussed next. 



IV. UNCERTAIN ENERGY AND VARIANCE 



Unfortunately there is a serious complication: the variance needed in the noise penalty is also unknown. Both 
the change in energy and its variance need to be estimated from the data. The variance in general will depend on 
the particular transition: (s — > s'); we cannot assume it is independent of the configuration of the walk. Precise 
estimates of variance of the energy difference are even more difficult to obtain than of energy difference itself since 
the error is the second moment of the noise and will fluctuate more. In Fig.@ is shown the effect on the double well 
example of using an estimate of the variance in the penalty formula instead of the true variance. The systematic error 
arises because the acceptance rate formula is a non-linear function of the variance. We will see that we must add an 
additional penalty for estimating the variance from the data. 

Let us suppose we generate n estimates of the change in action: {yi, . . . ,y n } where each y^ is assumed to be an 
independent normal variate with mean and variance: 

fa) - A (20) 

(( Vl -A) 2 )=na 2 . (21) 

Unbiased estimates of A and a 2 are: 

8 = ^ i=1 Vl (22) 



X 2 = (23) 
n{n — 1) 

By construction (8) — A and (x 2 ) = cr 2 - 

The joint probability distribution function of 8 and y2 is the product of a normal distribution for the mean and a 
chi-squared distribution for the variance: 

P(S,x 2 ;A,a)=P(S-A,a)P n ^( X 2 ;a) (24) 

where P(8 — A, er)is given in Eq.(|l2|) and 

^n-i(xV) = c nX n - 3 e-™ 2 /° 2 (25) 

with fj, = (n — l)/2 and 

( f ,/a 2 r 



(26) 



The generalization from the previous section is straightforward. The acceptance probability can only depend on 
the estimators 8 and x 2 . The average acceptance probability is: 



/OO /-OO 
dS d X 2 P(<5,x 2 ;A,a)a(<5, 
-oo JO 



X 2 )- (27) 



G 



Detailed balance requires: 



A(A, cr) = exp(-A)A(-A, a) (28) 

for all values of A and a > 0. We have two parameters to estimate and average over instead of one and a two 
dimensional homogeneous integral equation for a(S, x 2 )- 

In the limit of enough independent evaluations we recover the one parameter equation since linin^oo P„_i(x 2 ) = 
S(x 2 — cr 2 ) and the equations for different cr's decouple. 



Asymptotic Solution 



We can do the same type of analysis at large |A| as we did when a was known. A move is definitely uphill or 
downhill if S 2 ^> x 2 ■ Assume there exists a solution with A(A,a) = 1 for A <C — cr. Then A(A,a) = exp(— A) 
for A 3> cr. Assume this solution can be expanded in a power series in % 2 , a(8, x 2 ) = S^Lo ^kX e ~ S ■ Explicitly 
performing the integrals we obtain: 



exp(-cr 2 /2) = c nhT{fi + k)(cr 2 /ri) 



M+fc 



(29) 



Matching terms in powers of a 2 we obtain b}~. The expansion can be summed to obtain a Bessel function: 



a(6,x 2 ) = T(Li)e- 



(M-l)/2 



(30) 



This function is positive for x 2 < n/A. For larger values of x 2 either the assumption of A(A, cr) = 1 is wrong or no 
smooth solution exists. 

Taking the logarithm of the power series expansion, we obtain a convenient asymptotic form for the penalty in 
powers of r\ = x 2 / n: 



ub 



X 



X 



X 



4(n+l) 3(n + l)(n + 3) 



(31) 



The "Bessel" acceptance formula is: 



a B (<5, x 2 , n) = min(l,exp(-5 - u B )) 



(32) 



The first term x 2 /2, is the penalty in the case where we know the variance. The error in the error causes an 
additional penalty equal, in lowest order, to % 4 /(4ri). This asymptotic form should only be used for small values of r\ 
since the expansion is not convergent for r\ > 1/4. In Fig. ||we show errors in the detailed balance ratio as a function 
of A and cr for n = 128. It is seen that the errors are small but rapidly increasing as a function of cr. We find that the 
maximum relative error in the detailed balance ratio approximately equal to 0.15?7 2 . Good MC work will have the 
error less than 10 -3 requiring 77 < 0.1 Very accurate MC work with errors of less that 10~ 4 requires a ratio 77 < 0.02. 
This is a limitation on the noise level. 

As an example, we have calculated the deviation of the energy from its exact value for the double well potential. 
The results for the relative error in the energy are shown in Figure 6 for several values of n and cr. As we expect, the 
error in the energy depends only on 77 and is proportional to rj 2 . We also see that the estimates of limits on the noise 
level given above are correct. There is a dip at n = 64 for ?; 1=3 .5, beyond the region where the Bessel expansion is 
convergent. 

Figure [| shows the effect on the efficiency of the additional noise penalty. While the effect on the even moments 
is small, the efficiency of the first moment dramatically increases for noise levels a > 2, perhaps because rejections 
for large dispersions of the energy differences cause difficulty in crossing the barrier. The efficiency becomes more 
sensitive to a. 

We have not found an exact solution for Eq. (|28|). From numerical searches it is clear that much more accurate 
solutions exist than the asymptotic form. We have found such piecewise exponential forms. But the Bessel formula 
is a practical way of achieving detailed balance if one can generate enough independent normally distributed data. 



7 



V. DEVIATIONS FROM A NORMAL DISTRIBUTION 



We have assumed that S is normally distributed. In the case the noise is independent of position but otherwise 
completely general, we can perform the asymptotic analysis. Let us assume that: 



A(A) = J d5P{8 - A)a(S) (33) 

and that A(A) — 1 for sufficiently negative values of A. Then for large values of A the unique continuous solution is: 

a(5) = exp(-<5 - u). (34) 
The penalty u has an expansion in terms of the cumulants of P{S): 



oo 

u = 

n=2,4 



oo /"°° 

Kn/n! = -In(/ dxP(x)e~ x ). (35) 

_n A J — OO 



The odd cumulants vanish because P(x) = P(—x). For the normal distribution this reduces to Eq. ( jl3| ) and the 
penalty form is exact. The contribution of higher order cumulants could be either positive or negative leading to 
positive or negative penalties. 

Eq. ([35]) illustrates a limitation of the penalty method: one can not allow the energy difference to have a long 
tail of large values. It is important that the energy difference be bounded because a penalty can be defined only if 
lmij^oo e x P(x) = so the integral will exist. Suppose the energy difference in Eq. (|^) is a sum of an inverse power 
of the distances to the other particles A = rj m and that r is sampled uniformly. Then we find (in 3 dimensions) 

that at large 5: P(S) cx S-^ m . For any positive value of m the higher order cumulants and the penalty will not exist 
even though the mean and variance of S exist under the weaker condition: m < 3/2. We must arrange things so 
that large deviations of the energy difference from the exact value are non-existent or exponentially rare, perhaps by 
bounding the energy error. 



VI. COMPARISON WITH OTHER METHODS 



A. Method of Kennedy, Kuti, and Bhanot 



Kennedy, Kuti and Bhanotli3E3 (KKB) have introduced a noisy MC algorithm for lattice gauge theory. We adapted 
that method for the present application by using energy differences with respect to an approximate potential, u>(s), 
that can be determined quickly and exactly. Proposed moves are "pre-rejected" using w(s) and then the more 
expensive computation of an estimate of v (s) is done. Let us suppose that the deviation between these potentials can 
be bounded: max |<5w(s) — Sv(s)\ < e for some e. We determine an unbiased estimate of the ratio q needed in Eq. (Q) 
by using the power series expansion: 



q(s 



,-<5 



n=0 



(36) 



where S(s — > s') = v(s') — w(s') — v(s) + w(s). With a predefined probability we sample terms in the power series 
up to order n and obtain an estimate of q; this is a variant of the von Neumann-Ulam method We finally accept the 
move with probability 



a =(1 + <z)/(2 + e). (37) 

For an appropriate choice of parameters, a is in the range < a < 1 most of the time. The revised KKB method is 
given by the following pseudo-code: 



Sample s' from T(s — > s') 

If (exp [— w(s') + w(s)} < prn) then 

reject move 
else 

Qo = h = 1 
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do n = 1 , oo 

p n = min(7/n, 1) 

if (Pn < prn) exit loop 

sample x n = —v(s') + w(s') + v(s) — w(s) 

tn tri—l^n / (yiPn) 
In = Qn-1 + t n 

end do 

if [(1 + q n )/(2 + e) > prn] then accept move 



In this procedure 7 > is a parameter which controls the number of terms sampled. For 7 < 1 the average 
number of evaluations of x per step is n e (e 7 — 1), where n e is the acceptance ratio of the preliminary rejection step. 
Each sample of x must be uncorrelated with previous samples. As e — > 0, one recovers the Metropolis algorithm. 
The sampling distribution, 7, and e have to be fixed to ensure that a is in the interval [0, 1] almost all of the time. 
Violations for which a < put a limit on the size of the noise and the size of the sampling step, while e can be. 
made arbitrarily large to remove violations where a > 1. This will, however, affect the efficiency. A recent preprinto 
proposed to solve the problem of violating the constraints on the acceptance probabilities by introducing negative 
signs into the estimators. We have not explored this possibility. 

We made a comparison to the penalty method with the double well potential, using w(s) = <Z2S 4 as the approximate 
potential. (It confines the random walk but does not have the central barrier.) For a violation level of 1CP 4 , the 
maximum noise was a = 0.4. This is a much smaller noise level than is optimal in the penalty method. For this 
noise, a transition step of 0.45 was optimal. To optimize 7 and e, we first adjusted 7 until the half the desired number 
violations occurred for a < 0. Then we adjusted e until the total number of violations equaled 10 -4 . The errors in 
the first and second moments are given in Table [{], along with the parameters used in the KKB algorithm. We find 
that the KKB method is 2.3 times slower for the first moment and 3.5 times slower for the second moment than the 
penalty method (run at the same noise level, with the same transition step size and computing the variance with 
71 = 32 points). This comparison was done assuming / is sufficiently small that we do not have to take into account 
the multiple evaluations of the energy differences. Taking that into account would raise the inefficiency of the KKB 
method by another factor of 2.74, the average number of function evaluations. 

We also tested the KKB method with w(s) = v(s) (i.e., the argument of the exponential was only noise). The 
data for this case is also given in Table @. The maximum value of allowable noise was still a = 0.4. For a < 0.2, 
the average number of function evaluations was less than one, making the method more efficient than the penalty 
method, for a fixed noise level. For the first moment, KKB was 3.4 times more efficient for a = 0.1 and 1.3 times 
more efficient for a = 0.2. However, if we consider optimizing a as in Sec. Ill B| , the KKB method is less efficient 
than the penalty method. To be efficient at large values of /, larger values of a must be used, and there the KKB 
method is less efficient. At small values of /, the last term in Eq. ( |T7j ) dominates, and the lesser number of function 
evaluations yields no advantage for the KKB method. 

The KKB method requires taking enough samples to lower the noise to an acceptable value. In contrast, the 
penalty method requires taking enough samples to ensure the distribution is normal. Also, for this problem, the 
penalty method could have an even higher efficiency because it could use larger sampling steps sizes (the maximum 
KKB sampling step size depends on the quality of the approximate function, w). The advantage of the KKB method 
is that it makes no assumptions about the normalcy of the noise; the disadvantage is that one cannot guarantee that 
a is in the range [0, 1]. Knowledge that the noise is normally distributed allows one to use a much more efficient 
method. 



B. Reweighting 

Another alternative noisy MC method is to combine the stochastic evaluation of an exponential with the reweighting 
method. One can perform a simulation with w(s), generating a random walk {si}. Then an exact average can be 
generated by reweighting: 

(O) = El £ { Q Ql (38) 

where Qi exp(— (u(sj) — w(si))/ksT). As discussed above an estimate of Qi can be generated with the von Neumann- 
Ulam procedure by stochastically summing the power series expansion of the exponential. In this case we do not care 
whether the exponential is between and 1, only its variance is important. The difficulty is that the exponent of the 
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weight increases linearly with the size of the system, i.e. ((v(s)-w(s)) 2 ) oc N. Hence the variance of (O) will increase 
exponentially with the size of the system. This method is only appropriate for small systems, but no assumptions are 
made about the distribution of v(s) — w(s). The advantage of including the noise in the random walk rather than 
reweighting the visited states is that one works with energy differences only and it is possible to make the fluctuations 
of differences independent of the size of the system. 



VII. CONCLUSION 



We have shown a small modification of the usual random walk method by applying a penalty to the energy difference 
can compensate for noise in the computation of the energy difference. If the noise is normally distributed with a known 
variance, the compensation is exact. If one estimates the variance from n data points, we show that it suffices to have 
X 2 < O.ln and apply an additional penalty. On a double well potential we found that the the optimal noise level is 
typically k B T < a < 3k B T. 

The penalty method utilizes the power of Monte Carlo: one can choose the transition rules to obey detailed 
balance and to optimize convergence and use only well-controlled approximations. We can generalize to other noise 
distributions by using numerical solutions to the detailed balance equations as we have shown. We have adapted a 
method introduced by Kennedy et al&B but found it to be much slower once the noise level becomes high. 

We now plan to apply the algorithm to a serious application. As we have shown, very large gains in efficiency 
are sometimes possible. However, the problem remains of ensuring that the estimates of the energy differences are 
statistically independent and normally distributed. 

Codes used in calculations reported here are available at: ftittp: / /www. ncsa.uiuc.edu/Apps/CMP/index. html ] 
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FIG. 1. The optimal acceptance formula computed using the linear programming method (solid line) and using the penalty 
method (dotted line). Both are for a = 1. The accuracy of the LP solution is better than 1 part in 10 . 

FIG. 2. The logarithm of the acceptance ratio as a function of a 2 for the double well potential. The dashed line is proportional 
to exp(— cr 2 /2) and the dotted line to exp(— cr 2 /8) 

FIG. 3. The density as computed using the Metropolis formula (dotted line), the direct penalty (dashed line), and the Bessel 
penalty with N = 16 (solid line). In all cases the noise level was a = 2. 

FIG. 4. The relative in-efBciency of penalty MC as a function of a and the noise level, /. From bottom to top the values of 
/ are 0.5, 1, 2, 4. The solid lines are assuming the noise is known, the dashed lines are using the Bessel formula with n — 64 
independent evaluations. Fig. (4a) is the first moment, (4b) is the second moment. 



FIG. 5. The log of the detailed balance ratio versus A using the Bessel penalty in Eq. (32) with n = 128 points to estimate 
the variance. From top to bottom the curves are with noise levels of a — 2, 1.8, 1.6, 1.4, and 1.2. 

FIG. 6. The relative error in the energy for a double well potential versus r\ for several values of n. The circles are for n = 8, 
the diamonds are for n — 16, the squares are for n = 32, and the triangles are for n = 64. The dashed line has a slope of two. 
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TABLE I. Computed MC inefficiencies for the modified KKB method and the penalty method, n is the average number of 
function evaluations per step. The Bessel penalty method uses 32 data points and a transition step of 0.45. 
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